Filled-pause Modeling for Medical Transcriptions
Abstract
We present our recent progress in filled pause (FP) modeling for a highly spontaneous medical transcription task. Our studies confirm that FP modeling is an important topic for spontaneous speech applications and must be explicitly addressed in acoustic, lexical, and language modeling. We provide a framework for data-driven lexical modeling of FP acoustic variability with respect to phonemic realization and duration. By using a number of properly weighted FP pronunciation variants of variable lengths and applying specific acoustic models for FP, we achieved an 8% relative reduction in word error rate. We also tested different approaches for handling FP in the language model and integrating FP into the decoder. The best results with respect to both perplexity and word error rate were achieved by predicting FP probabilistically and removing it from the language model history. This approach reduces perplexity by 4% and provides a further gain in word accuracy.
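The idea of predicting FP probabilistically while removing it from the language model history can be illustrated with a toy bigram model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the token names `<fp>` and `<s>`, the function names, and the add-one smoothing are all hypothetical choices made for illustration, and test words are assumed to occur in the training vocabulary.

```python
import math
from collections import Counter

FP = "<fp>"   # hypothetical filled-pause token
BOS = "<s>"   # hypothetical sentence-start token


def bigram_events(tokens, skip_fp_in_history):
    """Return (history, word) pairs for a bigram model.

    With skip_fp_in_history=True, FP is still predicted as a word,
    but it never becomes the history of the following word.
    """
    events = []
    history = BOS
    for word in tokens:
        events.append((history, word))
        if not (skip_fp_in_history and word == FP):
            history = word
    return events


def train(sentences, skip_fp_in_history):
    """Collect bigram counts, history counts, and the vocabulary."""
    pair_counts, hist_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        for h, w in bigram_events(sent, skip_fp_in_history):
            pair_counts[(h, w)] += 1
            hist_counts[h] += 1
            vocab.add(w)
    return pair_counts, hist_counts, vocab


def perplexity(model, sentences, skip_fp_in_history):
    """Perplexity under add-one smoothing (an illustrative assumption)."""
    pair_counts, hist_counts, vocab = model
    log_prob, n = 0.0, 0
    for sent in sentences:
        for h, w in bigram_events(sent, skip_fp_in_history):
            p = (pair_counts[(h, w)] + 1) / (hist_counts[h] + len(vocab))
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)
```

Training and evaluating once with `skip_fp_in_history=True` and once with `False` on the same data gives a rough way to compare the two history treatments. Any difference observed on toy data says nothing about the 4% figure reported in the abstract.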
Similar Resources
Pronunciation Variants Modeling in Korean Spontaneous Speech Recognition
Pronunciation variants in spontaneous speech tend to be more variable than in planned speech. Spontaneous speech has significant sources of variation, including serious phonological variations, which make recognition extremely difficult. In this paper, we analyzed the auditory transcriptions of the dialogue for spontaneous speech recognition, and then classified the characteristics of conversationa...
Filled Pause Modeling
This document presents a streamlined approach to modeling filled pause distribution in spontaneous speech and populating a large clean corpus, making use of only the SRILM toolkit and a small training set. Although used for filled pause modeling, it can be fairly general and may be used to model other types of disfluencies, punctuation or sentence boundaries, with a minimal set of changes.
Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition
Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses “ah,” “ung,” “um,” “em,” and “h...
Prosodic Cues and Answer Type Detection for the Deception Sub-Challenge
Deception is a deliberate act to deceive an interlocutor by transmitting a message containing false or misleading information. Detection of deception consists of searching for reliable differences between liars and truth-tellers. In this paper, we used the Deceptive Speech Database (DSD) provided for the Deception sub-challenge. DSD consists of deceptive and non-deceptive answers to a set of unkn...
Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech
Although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown tha...